Take-Home Exercise 3

Take-Home
Code
Author

Kwee Cheng

Published

October 31, 2024

Modified

November 3, 2024

Overview and Objectives

In this take-home exercise, my aim is to evaluate the R packages needed to perform:

  • Global Measures of Spatial Autocorrelation

  • Local Measures of Spatial Autocorrelation

This is to be done on district-level data covering the different types of crime in Malaysia, which we will layer with Malaysia’s income inequality data.

This also serves to prototype the Shiny application UI and to choose the right type of components.

Data

Packages

  • sf provides a standardised way to work with spatial vector data (points, lines, polygons)

  • sfdep provides an sf- and tidyverse-friendly interface for spatial dependence analysis (neighbours, weights, and measures of spatial autocorrelation)

  • tmap creates thematic maps

  • tidyverse for easy data manipulation and some visualisation

  • knitr facilitates the integration of R code and documentation in reproducible research reports

pacman::p_load(sf, sfdep, tmap, tidyverse, knitr)

Importing the Data

Before the UI prototyping can be done, let’s see what type of data we are dealing with so that we can better plan the UI components to be used.

Let’s import the crime dataset

crime <- read_csv("data/aspatial/crime_district.csv")
Rows: 19152 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): state, district, category, type
dbl  (1): crimes
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s import the income inequality as well

income <- read_csv("data/aspatial/hh_inequality.csv")
Rows: 318 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): state, district
dbl  (1): gini
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Also import the annual principal labour force statistics

labour <- read_csv("data/aspatial/lfs_district.csv")
Rows: 560 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): state, district
dbl  (7): lf, lf_employed, lf_unemployed, lf_outside, p_rate, u_rate, ep_ratio
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
crime
# A tibble: 19,152 × 6
   state    district category type           date       crimes
   <chr>    <chr>    <chr>    <chr>          <date>      <dbl>
 1 Malaysia All      assault  all            2016-01-01  22327
 2 Malaysia All      assault  all            2017-01-01  21366
 3 Malaysia All      assault  all            2018-01-01  16902
 4 Malaysia All      assault  all            2019-01-01  16489
 5 Malaysia All      assault  all            2020-01-01  13279
 6 Malaysia All      assault  all            2021-01-01  11495
 7 Malaysia All      assault  all            2022-01-01  10348
 8 Malaysia All      assault  all            2023-01-01  10453
 9 Malaysia All      assault  causing_injury 2016-01-01   5531
10 Malaysia All      assault  causing_injury 2017-01-01   5024
# ℹ 19,142 more rows

From here we can identify some of the variables that the user can interact with: district, category, type, date, and crimes.

income
# A tibble: 318 × 4
   state district    date        gini
   <chr> <chr>       <date>     <dbl>
 1 Johor Batu Pahat  2019-01-01 0.295
 2 Johor Batu Pahat  2022-01-01 0.338
 3 Johor Johor Bahru 2019-01-01 0.388
 4 Johor Johor Bahru 2022-01-01 0.359
 5 Johor Kluang      2019-01-01 0.333
 6 Johor Kluang      2022-01-01 0.354
 7 Johor Kota Tinggi 2019-01-01 0.361
 8 Johor Kota Tinggi 2022-01-01 0.343
 9 Johor Kulai       2019-01-01 0.324
10 Johor Kulai       2022-01-01 0.337
# ℹ 308 more rows

Likewise, for income we have district, date, and gini.

labour
# A tibble: 560 × 10
   state district   date          lf lf_employed lf_unemployed lf_outside p_rate
   <chr> <chr>      <date>     <dbl>       <dbl>         <dbl>      <dbl>  <dbl>
 1 Johor Batu Pahat 2019-01-01  214.        210.           3.7       90.4   70.3
 2 Johor Batu Pahat 2020-01-01  219.        214.           4.7       92.2   70.4
 3 Johor Batu Pahat 2021-01-01  216         211            5         97.1   69  
 4 Johor Batu Pahat 2022-01-01  221.        217.           4         94.6   70  
 5 Johor Johor Bah… 2019-01-01  792.        768.          24.6      294.    73  
 6 Johor Johor Bah… 2020-01-01  804.        771.          32.5      297.    73  
 7 Johor Johor Bah… 2021-01-01  806.        772.          34.1      298.    73  
 8 Johor Johor Bah… 2022-01-01  830.        799           30.5      293.    73.9
 9 Johor Kluang     2019-01-01  166.        160.           5.6       66.9   71.3
10 Johor Kluang     2020-01-01  168.        161.           7.1       67.4   71.4
# ℹ 550 more rows
# ℹ 2 more variables: u_rate <dbl>, ep_ratio <dbl>

For labour we have district, date, lf, lf_employed, lf_unemployed, lf_outside, p_rate, u_rate, and ep_ratio.

UI Design

For a Shiny application in this course, we work with three main components: headerPanel, sidebarPanel, and mainPanel.

  • Header Panel: This is the topmost part of the UI, where we can put a description of the application or a navbar to move between pages. Each page leads to another group member’s part of this project.

  • Sidebar Panel: This panel mainly consists of the input controls that the user can adjust to change the map output in the Main Panel.

  • Main Panel: This is the primary area of the application. It displays the outputs (maps, plots, tables, etc.) based on the inputs given in the sidebar panel.

Header Panel

For this we would like to use navbarPage(), which Shiny provides. This keeps our project organised and makes it easier to navigate through the different pages that we will have.
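As an illustration, a minimal sketch of this structure (the page titles and placeholder content below are hypothetical, not the final design):

```r
library(shiny)

# Hedged sketch: a navbarPage() with one page per group member's part.
# Page names and contents are placeholders.
ui <- navbarPage(
  "Crime & Income Inequality in Malaysia",
  tabPanel("Overview", p("Project description goes here")),
  tabPanel("Global & Local Measures", p("My spatial autocorrelation maps")),
  tabPanel("EHSA", p("Another group member's part"))
)
```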

Side Panel

This part holds the input controls, chosen based on the variables and data types we have identified: district, category, type, date, crimes, and gini.

Some of the potential input controls that could be used are:

This is roughly what our side panel could look like, given the variables we have:

Another possibility is to present the filters in a card layout for the different sections of the plot, e.g. the Global and Local Moran filters grouped together in one card and the EHSA filters in their own card.
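As a sketch, the sidebar could combine selectInput() for the categorical variables and sliderInput() for the year; the choices below are illustrative placeholders that would be populated from the data:

```r
library(shiny)

# Hedged sketch of possible input controls; in the real app the
# choices would be derived from the crime and income datasets.
sidebarPanel(
  selectInput("district", "District", choices = c("Johor Bahru", "Klang")),
  selectInput("category", "Crime Category", choices = c("assault", "property")),
  selectInput("type", "Crime Type", choices = c("murder", "rape", "robbery")),
  sliderInput("year", "Year", min = 2019, max = 2022, value = 2019, sep = "")
)
```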

Main Panel

Since I am working with LISA maps, a side-by-side comparison of two maps would help the user visualise differences.

Another thing to consider is a tabset, in case I want to showcase more maps; this makes it easier for the user to navigate.

This is also roughly how our Shiny application would look with the different layouts.
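A minimal sketch of such a main panel, assuming hypothetical output IDs and using tmap’s tmapOutput() for the interactive maps:

```r
library(shiny)
library(tmap)

# Hedged sketch: a tabset with side-by-side LISA maps on one tab
# and a scatterplot on another; output IDs are placeholders.
mainPanel(
  tabsetPanel(
    tabPanel("LISA Maps",
             fluidRow(column(6, tmapOutput("lisa_crime")),
                      column(6, tmapOutput("lisa_gini")))),
    tabPanel("Moran Scatterplot", plotOutput("moran_scatter"))
  )
)
```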

Data Wrangling

Looking at the crime csv file, there are rows with “all” or “All” as values. These appear to be summaries across crime types or across districts for the different years, so let’s remove them.

excluded_column <- "date"
# Drop any row where a non-date column holds the summary value "all"/"All"
crime <- crime[!apply(crime[, !names(crime) %in% excluded_column] == "all", 1, any), ]
crime <- crime[!apply(crime[, !names(crime) %in% excluded_column] == "All", 1, any), ]

Let’s also add a column called year to each of the datasets, so that it will be easier to split the data up by year.

crime <- crime %>%
              mutate(year = year(date))

income <- income %>%
              mutate(year = year(date))

labour <- labour %>%
              mutate(year = year(date))

Let’s load the Malaysia shapefile and transform the CRS to the Malaysian projected coordinate system (EPSG: 3168).

msia_sf <- read_sf(dsn = "data/geospatial/mys_adm_unhcr_20210211_shp", 
                   layer = "mys_admbnda_adm2_unhcr_20210211") %>%
  # read_sf() already returns an sf object, so we only need to project it
  st_transform(crs = 3168)
st_crs(msia_sf)
Coordinate Reference System:
  User input: EPSG:3168 
  wkt:
PROJCRS["Kertau (RSO) / RSO Malaya (m)",
    BASEGEOGCRS["Kertau (RSO)",
        DATUM["Kertau (RSO)",
            ELLIPSOID["Everest 1830 (RSO 1969)",6377295.664,300.8017,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4751]],
    CONVERSION["Rectified Skew Orthomorphic Malaya Grid (metre)",
        METHOD["Hotine Oblique Mercator (variant A)",
            ID["EPSG",9812]],
        PARAMETER["Latitude of projection centre",4,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8811]],
        PARAMETER["Longitude of projection centre",102.25,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8812]],
        PARAMETER["Azimuth of initial line",323.0257905,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8813]],
        PARAMETER["Angle from Rectified to Skew Grid",323.130102361111,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8814]],
        PARAMETER["Scale factor on initial line",0.99984,
            SCALEUNIT["unity",1],
            ID["EPSG",8815]],
        PARAMETER["False easting",804670.24,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Malaysia - West Malaysia onshore."],
        BBOX[1.21,99.59,6.72,104.6]],
    ID["EPSG",3168]]

Hole in boundary file

Next, check whether there are any holes in the boundary file.

u_msia <- st_union(msia_sf)
plot(u_msia)

Missing values

Let’s check whether there are any missing values in the crime data.

na <- crime %>%
  summarise(na_district = sum(is.na(district)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_date = sum(is.na(date)),
            na_crimes = sum(is.na(crimes))
            )
print(na)
# A tibble: 1 × 5
  na_district na_category na_type na_date na_crimes
        <int>       <int>   <int>   <int>     <int>
1           0           0       0       0         0

Let’s also do a check for the income inequality data

na <- income %>%
  summarise(na_district = sum(is.na(district)),
            na_date = sum(is.na(date)),
            na_gini = sum(is.na(gini))
            )
print(na)
# A tibble: 1 × 3
  na_district na_date na_gini
        <int>   <int>   <int>
1           0       0       0

And also for the labour data

na <- labour %>%
  summarise(na_district = sum(is.na(district)),
            na_date = sum(is.na(date)),
            na_lf = sum(is.na(lf)),
            na_lf_unemployed = sum(is.na(lf_unemployed)),
            na_u_rate = sum(is.na(u_rate)),
            )
print(na)
# A tibble: 1 × 5
  na_district na_date na_lf na_lf_unemployed na_u_rate
        <int>   <int> <int>            <int>     <int>
1           0       0     0                0         0

Left Join

Mismatched Districts

Having checked everything else, let’s check whether there are any issues between msia_sf and crime.

combined_data <- bind_cols(crime = sort(unique(crime$district)), msia_sf = sort(unique(msia_sf$ADM2_EN)))

# Create a new column to compare the values
combined_data <- combined_data %>%
  mutate(same_values = crime == msia_sf) %>% filter(same_values == FALSE)

# View the result
combined_data

This generates an error because the numbers of rows differ: crime has 159 unique districts while msia_sf has 144.

Let’s run another piece of code to see the differences.

crime_unique <- data.frame(district = sort(unique(crime$district)))
msia_unique <- data.frame(ADM2_EN = sort(unique(msia_sf$ADM2_EN)))

# Find rows in crime_unique that don't have a match in msia_unique
unmatched_crime <- anti_join(crime_unique, msia_unique, by = c("district" = "ADM2_EN"))

# Find rows in msia_unique that don't have a match in crime_unique
unmatched_msia <- anti_join(msia_unique, crime_unique, by = c("ADM2_EN" = "district"))

# Combine results to see all mismatches

unmatched_crime
                 district
1             Ampang Jaya
2                    Arau
3            Bandar Bharu
4              Batu Gajah
5             Brickfields
6        Cameron Highland
7                  Cheras
8              Dang Wangi
9                   Gerik
10          Hulu Selangor
11                   Ipoh
12        Iskandar Puteri
13    Johor Bahru Selatan
14      Johor Bahru Utara
15                 Kajang
16                 Kangar
17          Klang Selatan
18            Klang Utara
19      Kota Kinabatangan
20         Kota Samarahan
21            Kuala Lipis
22                Manjung
23              Matu Daro
24                  Nilai
25               Nusajaya
26           Padang Besar
27                Padawan
28         Pengkalan Hulu
29          Petaling Jaya
30 Seberang Perai Selatan
31  Seberang Perai Tengah
32   Seberang Perai Utara
33                 Selama
34                 Sentul
35                Serdang
36              Seri Alam
37              Sg. Buloh
38              Shah Alam
39            Subang Jaya
40           Sungai Buloh
41           Sungai Siput
42                Taiping
43          Tanjong Malim
44                  Tapah
45            Wangsa Maju
unmatched_msia
             ADM2_EN
1            Asajaya
2      Batang Padang
3               Daro
4        Johor Bahru
5       Kinabatangan
6              Kinta
7              Klang
8        Kuala Penyu
9   Larut Dan Matang
10             Lipis
11 Manjung (Dinding)
12              Matu
13           Nabawan
14             Pakan
15            Perlis
16          Petaling
17             Pitas
18        Pokok Sena
19           Putatan
20       S.P. Tengah
21        S.P. Utara
22       S.P.Selatan
23         Samarahan
24          Selangau
25          Tambunan
26            Tongod
27        Ulu Langat
28         Ulu Perak
29      Ulu Selangor
30  WP. Kuala Lumpur

From here we can see which districts are missing from which file.

Let’s see all the unique districts in the sf file

sort(unique(msia_sf$ADM2_EN))
  [1] "Alor Gajah"        "Asajaya"           "Bachok"           
  [4] "Baling"            "Bandar Baharu"     "Barat Daya"       
  [7] "Batang Padang"     "Batu Pahat"        "Bau"              
 [10] "Beaufort"          "Belaga"            "Beluran"          
 [13] "Bentong"           "Bera"              "Besut"            
 [16] "Betong"            "Bintulu"           "Cameron Highlands"
 [19] "Dalat"             "Daro"              "Dungun"           
 [22] "Gombak"            "Gua Musang"        "Hilir Perak"      
 [25] "Hulu Terengganu"   "Jasin"             "Jelebu"           
 [28] "Jeli"              "Jempol"            "Jerantut"         
 [31] "Johor Bahru"       "Julau"             "Kampar"           
 [34] "Kanowit"           "Kapit"             "Kemaman"          
 [37] "Keningau"          "Kerian"            "Kinabatangan"     
 [40] "Kinta"             "Klang"             "Kluang"           
 [43] "Kota Belud"        "Kota Bharu"        "Kota Kinabalu"    
 [46] "Kota Marudu"       "Kota Setar"        "Kota Tinggi"      
 [49] "Kuala Kangsar"     "Kuala Krai"        "Kuala Langat"     
 [52] "Kuala Muda"        "Kuala Penyu"       "Kuala Pilah"      
 [55] "Kuala Selangor"    "Kuala Terengganu"  "Kuantan"          
 [58] "Kubang Pasu"       "Kuching"           "Kudat"            
 [61] "Kulaijaya"         "Kulim"             "Kunak"            
 [64] "Lahad Datu"        "Langkawi"          "Larut Dan Matang" 
 [67] "Lawas"             "Ledang"            "Limbang"          
 [70] "Lipis"             "Lubok Antu"        "Lundu"            
 [73] "Machang"           "Manjung (Dinding)" "Maran"            
 [76] "Marang"            "Marudi"            "Matu"             
 [79] "Melaka Tengah"     "Meradong"          "Mersing"          
 [82] "Miri"              "Muar"              "Mukah"            
 [85] "Nabawan"           "Padang Terap"      "Pakan"            
 [88] "Papar"             "Pasir Mas"         "Pasir Puteh"      
 [91] "Pekan"             "Penampang"         "Pendang"          
 [94] "Perak Tengah"      "Perlis"            "Petaling"         
 [97] "Pitas"             "Pokok Sena"        "Pontian"          
[100] "Port Dickson"      "Putatan"           "Ranau"            
[103] "Raub"              "Rembau"            "Rompin"           
[106] "S.P. Tengah"       "S.P. Utara"        "S.P.Selatan"      
[109] "Sabak Bernam"      "Samarahan"         "Sandakan"         
[112] "Saratok"           "Sarikei"           "Segamat"          
[115] "Selangau"          "Semporna"          "Sepang"           
[118] "Seremban"          "Serian"            "Setiu"            
[121] "Sibu"              "Sik"               "Simunjan"         
[124] "Sipitang"          "Song"              "Sri Aman"         
[127] "Tambunan"          "Tampin"            "Tanah Merah"      
[130] "Tatau"             "Tawau"             "Temerloh"         
[133] "Tenom"             "Timur Laut"        "Tongod"           
[136] "Tuaran"            "Tumpat"            "Ulu Langat"       
[139] "Ulu Perak"         "Ulu Selangor"      "W.P. Labuan"      
[142] "W.P. Putrajaya"    "WP. Kuala Lumpur"  "Yan"              

There is no easy way to fix this other than to google the districts mentioned in crime and map each one as closely as possible to a district in the sf file.

crime <- crime %>%
  mutate(district = recode(district,
                           # Johor Bahru mappings
                           "Iskandar Puteri" = "Johor Bahru",
                           "Nusajaya" = "Johor Bahru",
                           "Johor Bahru Selatan" = "Johor Bahru",
                           "Johor Bahru Utara" = "Johor Bahru",
                           "Seri Alam" = "Johor Bahru",
                           
                           # Bandar Baharu correction
                           "Bandar Bharu" = "Bandar Baharu",
                           
                           # WP Kuala Lumpur mappings
                           "Brickfields" = "WP. Kuala Lumpur",
                           "Cheras" = "WP. Kuala Lumpur",
                           "Dang Wangi" = "WP. Kuala Lumpur",
                           "Sentul" = "WP. Kuala Lumpur",
                           "Wangsa Maju" = "WP. Kuala Lumpur",
                           
                           # Seremban correction
                           "Nilai" = "Seremban",
                           
                           # Seberang Perai corrections
                           "Seberang Perai Selatan" = "S.P.Selatan",
                           "Seberang Perai Tengah" = "S.P. Tengah",
                           "Seberang Perai Utara" = "S.P. Utara",
                           
                           # Cameron Highlands correction
                           "Cameron Highland" = "Cameron Highlands",
                           
                           # Lipis correction
                           "Kuala Lipis" = "Lipis",
                           
                           # Kinta mappings
                           "Batu Gajah" = "Kinta",
                           "Ipoh" = "Kinta",
                           
                           # Ulu Perak mappings
                           "Gerik" = "Ulu Perak",
                           "Pengkalan Hulu" = "Ulu Perak",
      
                           
                           # Manjung correction
                           "Manjung" = "Manjung (Dinding)",
                           
                           # Larut Dan Matang mappings
                           "Selama" = "Larut Dan Matang",
                           "Taiping" = "Larut Dan Matang",
                           
                           # Kuala Kangsar correction
                           "Sungai Siput" = "Kuala Kangsar",
                           
                           # Batang Padang mappings
                           "Tanjong Malim" = "Batang Padang",
                           "Tapah" = "Batang Padang",
                           
                           # Perlis mappings
                           "Arau" = "Perlis",
                           "Kangar" = "Perlis",
                           "Padang Besar" = "Perlis",
                           
                           # Kinabatangan correction
                           "Kota Kinabatangan" = "Kinabatangan",
                           
                           # Samarahan correction
                           "Kota Samarahan" = "Samarahan",
                           
                           # Mukah correction
                           "Matu Daro" = "Mukah",
                           
                           # Kuching correction
                           "Padawan" = "Kuching",
                           
                           # Gombak correction
                           "Ampang Jaya" = "Gombak",
                           
                           # Ulu Langat correction
                           "Kajang" = "Ulu Langat",
                           
                           # Ulu Selangor correction
                           "Hulu Selangor" = "Ulu Selangor",
                           
                           # Klang mappings
                           "Klang Selatan" = "Klang",
                           "Klang Utara" = "Klang",
                           
                           # Petaling mappings
                           "Petaling Jaya" = "Petaling",
                           "Serdang" = "Petaling",
                           "Sg. Buloh" = "Petaling",
                           "Shah Alam" = "Petaling",
                           "Subang Jaya" = "Petaling",
                           "Sungai Buloh" = "Petaling",
                           
                           # Default to keep original name if no match
                           .default = district))

Let’s check again to see if we altered the data correctly.

crime_unique <- data.frame(district = sort(unique(crime$district)))

# Find rows in crime_unique that don't have a match in msia_unique
unmatched_crime <- anti_join(crime_unique, msia_unique, by = c("district" = "ADM2_EN"))

unmatched_crime
[1] district
<0 rows> (or 0-length row.names)

As we plan to overlay the labour data, let’s run the same checks for it as well.

labour_unique <- data.frame(district = sort(unique(labour$district)))
msia_unique <- data.frame(ADM2_EN = sort(unique(msia_sf$ADM2_EN)))

# Find rows in labour_unique that don't have a match in msia_unique
unmatched_labour <- anti_join(labour_unique, msia_unique, by = c("district" = "ADM2_EN"))

# Find rows in msia_unique that don't have a match in labour_unique
unmatched_msia <- anti_join(msia_unique, labour_unique, by = c("ADM2_EN" = "district"))

# Combine results to see all mismatches

unmatched_labour
                district
1             Hulu Perak
2                  Kulai
3                Manjung
4               Maradong
5 Seberang Perai Selatan
6  Seberang Perai Tengah
7   Seberang Perai Utara
8                Tangkak
unmatched_msia
             ADM2_EN
1          Kulaijaya
2             Ledang
3  Manjung (Dinding)
4           Meradong
5             Perlis
6        S.P. Tengah
7         S.P. Utara
8        S.P.Selatan
9          Ulu Perak
10       W.P. Labuan
11    W.P. Putrajaya
12  WP. Kuala Lumpur

Let’s change the districts in labour like what we did for crime

labour <- labour %>%
  mutate(district = recode(district,
                           "Kulai" = "Kulaijaya",
                           # Seberang Perai corrections
                           "Seberang Perai Selatan" = "S.P.Selatan",
                           "Seberang Perai Tengah" = "S.P. Tengah",
                           "Seberang Perai Utara" = "S.P. Utara",
                           
                           # Ulu Perak mappings
                           "Hulu Perak" = "Ulu Perak",
                           
                           # Manjung correction
                           "Manjung" = "Manjung (Dinding)",
                           
                           "Maradong" = "Meradong",
                           "Tangkak" = "Ledang",
                           
                           # Default to keep original name if no match
                           .default = district))

Let’s check if there are still any issues with the district for labour

labour_unique <- data.frame(district = sort(unique(labour$district)))
msia_unique <- data.frame(ADM2_EN = sort(unique(msia_sf$ADM2_EN)))

# Find rows in labour_unique that don't have a match in msia_unique
unmatched_labour <- anti_join(labour_unique, msia_unique, by = c("district" = "ADM2_EN"))

unmatched_labour
[1] district
<0 rows> (or 0-length row.names)

Let’s combine our labour data with our crime data.

crime_labour <- crime %>%
        filter(year >= 2019 & year <= 2022) %>%
        left_join(labour, by = c("district", "year")) %>%
        # keep state, district, category, type, crimes, year, lf, lf_unemployed, u_rate
        select(1:4, 6, 7, 10, 12, 15)

Let’s check for any empty rows introduced by the left_join.

na <- crime_labour %>%
  summarise(na_district = sum(is.na(district)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_crimes = sum(is.na(crimes)),
            na_year = sum(is.na(year)),
            na_lf = sum(is.na(lf)),
            na_lf_unemployed = sum(is.na(lf_unemployed)),
            na_u_rate = sum(is.na(u_rate)),
            )
print(na)
# A tibble: 1 × 8
  na_district na_category na_type na_crimes na_year na_lf na_lf_unemployed
        <int>       <int>   <int>     <int>   <int> <int>            <int>
1           0           0       0         0       0   480              480
# ℹ 1 more variable: na_u_rate <int>

There are NA values so let’s remove them

crime_labour <- na.omit(crime_labour)

Do another check

na <- crime_labour %>%
  summarise(na_district = sum(is.na(district)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_crimes = sum(is.na(crimes)),
            na_year = sum(is.na(year)),
            na_lf = sum(is.na(lf)),
            na_lf_unemployed = sum(is.na(lf_unemployed)),
            na_u_rate = sum(is.na(u_rate)),
            )
print(na)
# A tibble: 1 × 8
  na_district na_category na_type na_crimes na_year na_lf na_lf_unemployed
        <int>       <int>   <int>     <int>   <int> <int>            <int>
1           0           0       0         0       0     0                0
# ℹ 1 more variable: na_u_rate <int>

Finally, we combine it with our msia_sf.

msia <- left_join(msia_sf,crime_labour, by = c("ADM2_EN" = "district")) %>%
        select(1,6,16:23)

msia
Simple feature collection with 7024 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 184853.1 ymin: 94420.8 xmax: 2380932 ymax: 829136
Projected CRS: Kertau (RSO) / RSO Malaya (m)
# A tibble: 7,024 × 11
   ADM2_EN    ADM1_EN state.x category type     crimes  year    lf lf_unemployed
   <chr>      <chr>   <chr>   <chr>    <chr>     <dbl> <dbl> <dbl>         <dbl>
 1 Batu Pahat Johor   Johor   assault  causing…     41  2019  214.           3.7
 2 Batu Pahat Johor   Johor   assault  causing…     43  2020  219.           4.7
 3 Batu Pahat Johor   Johor   assault  causing…     22  2021  216            5  
 4 Batu Pahat Johor   Johor   assault  causing…     19  2022  221.           4  
 5 Batu Pahat Johor   Johor   assault  murder        3  2019  214.           3.7
 6 Batu Pahat Johor   Johor   assault  murder        3  2020  219.           4.7
 7 Batu Pahat Johor   Johor   assault  murder        0  2021  216            5  
 8 Batu Pahat Johor   Johor   assault  murder        3  2022  221.           4  
 9 Batu Pahat Johor   Johor   assault  rape         29  2019  214.           3.7
10 Batu Pahat Johor   Johor   assault  rape         16  2020  219.           4.7
# ℹ 7,014 more rows
# ℹ 2 more variables: u_rate <dbl>, geometry <MULTIPOLYGON [m]>

NA Values

Looking at this, we can see that some additional rows have been added. Let’s check whether there are any NA values.

na <- msia %>%
  summarise(na_district = sum(is.na(ADM2_EN)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_crimes = sum(is.na(crimes)),
            na_year = sum(is.na(year)),
            na_lf = sum(is.na(lf)),
            na_lf_unemployed = sum(is.na(lf_unemployed)),
            na_u_rate = sum(is.na(u_rate)),
            )
print(na)
Simple feature collection with 1 feature and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 184853.1 ymin: 94420.8 xmax: 2380932 ymax: 829136
Projected CRS: Kertau (RSO) / RSO Malaya (m)
# A tibble: 1 × 9
  na_district na_category na_type na_crimes na_year na_lf na_lf_unemployed
        <int>       <int>   <int>     <int>   <int> <int>            <int>
1           0          16      16        16      16    16               16
# ℹ 2 more variables: na_u_rate <int>, geometry <MULTIPOLYGON [m]>

Let’s remove the NA rows

msia <- na.omit(msia)

Do another check

na <- msia %>%
  summarise(na_district = sum(is.na(ADM2_EN)),
            na_category = sum(is.na(category)),
            na_type = sum(is.na(type)),
            na_crimes = sum(is.na(crimes)),
            na_year = sum(is.na(year)),
            na_lf = sum(is.na(lf)),
            na_lf_unemployed = sum(is.na(lf_unemployed)),
            na_u_rate = sum(is.na(u_rate)),
            )
print(na)
Simple feature collection with 1 feature and 8 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 184853.1 ymin: 94420.8 xmax: 2380932 ymax: 829136
Projected CRS: Kertau (RSO) / RSO Malaya (m)
# A tibble: 1 × 9
  na_district na_category na_type na_crimes na_year na_lf na_lf_unemployed
        <int>       <int>   <int>     <int>   <int> <int>            <int>
1           0           0       0         0       0     0                0
# ℹ 2 more variables: na_u_rate <int>, geometry <MULTIPOLYGON [m]>

Let’s check for duplicates as well

duplicates <- msia %>%
    group_by(ADM2_EN, year, category, type, crimes) %>%
    filter(n() > 1)
if(nrow(duplicates) > 0) {
    print("Duplicate combinations found!")
    print(duplicates)
}
[1] "Duplicate combinations found!"
Simple feature collection with 288 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 284518.8 ymin: 116154.7 xmax: 1641930 ymax: 656488.5
Projected CRS: Kertau (RSO) / RSO Malaya (m)
# A tibble: 288 × 11
# Groups:   ADM2_EN, year, category, type, crimes [125]
   ADM2_EN     ADM1_EN state.x category type    crimes  year    lf lf_unemployed
 * <chr>       <chr>   <chr>   <chr>    <chr>    <dbl> <dbl> <dbl>         <dbl>
 1 Johor Bahru Johor   Johor   assault  murder       3  2020  804.          32.5
 2 Johor Bahru Johor   Johor   assault  murder       3  2022  830.          30.5
 3 Johor Bahru Johor   Johor   assault  robber…      0  2019  792.          24.6
 4 Johor Bahru Johor   Johor   assault  robber…      0  2020  804.          32.5
 5 Johor Bahru Johor   Johor   assault  robber…      0  2022  830.          30.5
 6 Johor Bahru Johor   Johor   assault  robber…      0  2019  792.          24.6
 7 Johor Bahru Johor   Johor   assault  robber…      0  2020  804.          32.5
 8 Johor Bahru Johor   Johor   assault  robber…      0  2021  806.          34.1
 9 Johor Bahru Johor   Johor   assault  robber…      0  2022  830.          30.5
10 Johor Bahru Johor   Johor   property theft_…      5  2022  830.          30.5
# ℹ 278 more rows
# ℹ 2 more variables: u_rate <dbl>, geometry <MULTIPOLYGON [m]>

Global Measures of Spatial Autocorrelation

Calculating Neighbours and Weights

We will define neighbours based on Queen's contiguity, then assign spatial weights to each neighbouring polygon.

msia_nb_q <- st_contiguity(msia, queen=TRUE)

As this takes time to compute, let's write the result to an rds file so it can be reused without recomputing.

write_rds(msia_nb_q, "data/rds/msia_nb_q.rds")
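On later renders, the cached neighbour list can be read back instead of recomputed. A minimal sketch of this cache-or-compute pattern (assuming the rds file path above and that `msia` is already loaded):

```r
# Sketch: reuse the cached neighbour list if it exists, otherwise
# recompute it once and cache it for next time.
if (file.exists("data/rds/msia_nb_q.rds")) {
  msia_nb_q <- read_rds("data/rds/msia_nb_q.rds")
} else {
  msia_nb_q <- st_contiguity(msia, queen = TRUE)
  write_rds(msia_nb_q, "data/rds/msia_nb_q.rds")
}
```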

Computing Row-Standardised Weight Matrix

Next, we need to assign spatial weights to each neighbouring polygon.

The st_weights() function from the sfdep package can be used to supplement a neighbour list with spatial weights.

msia_wm_rs <- st_weights(msia_nb_q, style="W")

We will mutate the newly created neighbour list object msia_nb_q and the weight matrix msia_wm_rs into our existing msia object. The result is a new object, which we will call wm_q.

wm_q <- msia %>%
  mutate(nb = msia_nb_q,
         wt = msia_wm_rs,
         .before = 1) 

Global Moran’s I Test

To assess spatial autocorrelation in our dataset, i.e. whether crimes in a district tend to cluster with those in neighbouring districts, we run the Global Moran's I test.

global_moran_test(wm_q$crimes,
                  wm_q$nb,
                  wm_q$wt,
                  zero.policy = TRUE,
                  na.action=na.omit)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 118.83, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     1.240600e-01     -1.427144e-04      1.092536e-06 

From the test, the positive Moran's I statistic suggests that there is clustering, i.e. a degree of positive spatial autocorrelation. This might be expected: after committing a crime, an offender might move to a neighbouring area to commit another, so as to be harder to track.

We can also see that the p-value is small. From a frequentist approach, this result is unlikely to have occurred by chance.

To strengthen our findings, we run a monte-carlo simulation.

set.seed(2932)

Global Moran’s I permutation test

global_moran_perm(wm_q$crimes,
           wm_q$nb,
           wm_q$wt,
           zero.policy = TRUE,
           nsim = 99,
           na.action=na.omit)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.12406, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

From the output above, the Moran's I statistic after 99 permutations is 0.12406 with a p-value < 2.2e-16, similar to our previous result. The low p-value again suggests that the clustering did not occur randomly.

Local Moran I

Local Indicators of Spatial Association (LISA) let us evaluate clusters between districts, where higher values denote that a district is more heavily influenced by its surroundings.

Calculating Local Moran I

We calculate the local Moran's I statistics and append the results to the original dataframe as new columns.

wm_q <- wm_q %>%
          mutate(local_moran = local_moran(
            crimes, nb, wt, nsim = 99, zero.policy=TRUE),
                 .before = 1) %>%
          unnest(local_moran)

Visualising Local Moran I

lisa <- wm_q
map1 <- tm_shape(lisa) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of No of crimes",
            main.title.size = 0.8)

map2 <- tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)
Variable(s) "ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

LISA

The local indicator of spatial association (LISA) for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation. A LISA map is a categorical map showing the type of outliers and clusters. There are two types of outliers, namely High-Low and Low-High outliers. Likewise, there are two types of clusters, namely High-High and Low-Low clusters.

  • High-Low Outliers: Districts with a high value of crimes, surrounded by neighbouring districts with low values of crimes.

  • Low-High Outliers: Districts with a low value of crimes, surrounded by neighbouring districts with high values of crimes.

  • High-High Clusters: Districts with a high value of crimes, surrounded by neighbouring districts with high values of crimes.

  • Low-Low Clusters: Districts with a low value of crimes, surrounded by neighbouring districts with low values of crimes.
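For intuition, the four classes can be sketched in base R from a district's centred value and the centred spatial lag (mean of its neighbours). This is a hypothetical helper for illustration only; sfdep's local_moran() already returns these labels in its mean/median columns.

```r
# Hypothetical sketch: derive the LISA class from a centred value `z`
# (value minus its mean) and its centred spatial lag `z_lag`.
lisa_class <- function(z, z_lag) {
  if (z >= 0 && z_lag >= 0) "High-High"
  else if (z < 0 && z_lag < 0) "Low-Low"
  else if (z >= 0 && z_lag < 0) "High-Low"
  else "Low-High"
}

lisa_class(2.1, -0.4)  # "High-Low": a high-crime district amid low-crime neighbours
```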

lisa_sig <- lisa %>% filter(p_ii_sim < 0.05)
tm_shape(lisa) +
  tm_polygons() + 
  tm_borders(alpha = 0.5) + 
  tm_shape(lisa_sig) + 
  tm_fill("mean", title = "LISA class") +
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "LISA map of crimes", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Emerging Hot Spot Analysis

Performing Emerging Hot Spot Analysis

With EHSA we can, and likely should, incorporate the time lag of our spatial neighbours. Secondly, there are classifications proposed by ESRI which help us understand how each location is changing over time. Both of these are handled by the emerging_hotspot_analysis() function.

emerging_hotspot_analysis() takes, at minimum, a spacetime object x and the quoted name of the variable of interest in .var. We can specify the number of time lags using the argument k, which is set to 1 by default.

For this, let's create a data frame without the geometry, keeping only the assault category and the causing_injury type.

category_to_select <- c(
  "assault"
)
type_to_select <- c(
  "causing_injury"
)
msia_df <- msia %>% filter(category %in% category_to_select, type %in% type_to_select )
msia_df <- msia_df %>%
  select(year, crimes, ADM2_EN) %>%
  st_drop_geometry()
msia_sf_filtered <- msia_sf %>%
  semi_join(msia_df, by = "ADM2_EN")

Next is to create a spacetime object

msia_spt <- spacetime(msia_df, msia_sf,
                 .loc_col = "ADM2_EN",
                 .time_col = "year")

Let’s check if it is indeed a spacetime object

is_spacetime_cube(msia_spt)
! Number of rows does not equal `n time-periods x n locations`
[1] FALSE
msia_df <- msia_df %>%
    complete(year = unique(year),
             ADM2_EN = unique(ADM2_EN),
             fill = list(crimes = 0))  # Fill missing values with 0 or NA as needed

print(paste("Number of years:", length(unique(msia_df$year))))
[1] "Number of years: 4"
print(paste("Number of locations:", length(unique(msia_df$ADM2_EN))))
[1] "Number of locations: 128"
print(paste("Total rows:", nrow(msia_df)))
[1] "Total rows: 584"

For our spacetime cube we should get 128 * 4 = 512 rows, but we are getting 584 rows, so there could be cases of duplication.
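The arithmetic behind this check can be sketched in a small helper; this is illustrative only, not what is_spacetime_cube() does internally:

```r
# Sketch of the cube requirement: a spacetime cube needs exactly
# n_locations * n_time_periods rows, one per (location, year) pair.
is_complete_cube <- function(n_rows, n_locations, n_times) {
  n_rows == n_locations * n_times
}

is_complete_cube(584, 128, 4)  # FALSE: 72 surplus rows imply duplicates
is_complete_cube(512, 128, 4)  # TRUE
```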

duplicates <- msia_df %>%
    group_by(year, ADM2_EN) %>%
    filter(n() > 1)
if(nrow(duplicates) > 0) {
    print("Duplicate combinations found!")
    print(duplicates)
}
[1] "Duplicate combinations found!"
# A tibble: 120 × 3
# Groups:   year, ADM2_EN [48]
    year ADM2_EN       crimes
   <dbl> <chr>          <dbl>
 1  2019 Batang Padang     11
 2  2019 Batang Padang     14
 3  2019 Gombak            47
 4  2019 Gombak           117
 5  2019 Johor Bahru       46
 6  2019 Johor Bahru      126
 7  2019 Johor Bahru      140
 8  2019 Johor Bahru        0
 9  2019 Johor Bahru       50
10  2019 Kinta              3
# ℹ 110 more rows

Let's remove them by averaging the duplicated rows for each year-district pair.

msia_df <- msia_df %>%
    group_by(year, ADM2_EN) %>%
    summarise(crimes = mean(crimes, na.rm = TRUE)) %>%
    ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
check_duplicates <- msia_df %>%
    group_by(year, ADM2_EN) %>%
    filter(n() > 1)
print("Number of remaining duplicates:")
[1] "Number of remaining duplicates:"
print(nrow(check_duplicates))
[1] 0

Let’s create the spacetime object again

msia_spt <- spacetime(msia_df, msia_sf,
                 .loc_col = "ADM2_EN",
                 .time_col = "year")
is_spacetime_cube(msia_spt)
! Number of rows does not equal `n time-periods x n locations`
[1] FALSE

Another possible cause could be that msia_sf has more locations than msia_df. Let's check.

n_locations_df <- length(unique(msia_df$ADM2_EN))
n_locations_sf <- length(unique(msia_sf$ADM2_EN))
print(n_locations_df)
[1] 128
print(n_locations_sf)
[1] 144

This is indeed the case, so let's restrict msia_sf to only the districts present in msia_df.

msia_sf_filtered <- msia_sf %>%
  semi_join(msia_df, by = "ADM2_EN")

n_locations_df <- length(unique(msia_df$ADM2_EN))
n_locations_sf <- length(unique(msia_sf_filtered$ADM2_EN))
print(n_locations_df)
[1] 128
print(n_locations_sf)
[1] 128

Create the spacetime object again

msia_spt <- spacetime(msia_df, msia_sf_filtered,
                 .loc_col = "ADM2_EN",
                 .time_col = "year")
is_spacetime_cube(msia_spt)
[1] TRUE

We will perform the EHSA analysis using emerging_hotspot_analysis() of the sfdep package. It takes the spacetime object msia_spt and the name of the variable of interest, crimes, for the .var argument.

ehsa <- emerging_hotspot_analysis(
  x = msia_spt, 
  .var = "crimes", 
  k = 1, 
  nsim = 99
)
Warning in spdep::poly2nb(geometry, queen = queen, ...): some observations have no neighbours;
if this seems unexpected, try increasing the snap argument.
Warning in spdep::poly2nb(geometry, queen = queen, ...): neighbour object has 4 sub-graphs;
if this sub-graph count seems unexpected, try increasing the snap argument.

We can then join them together

ehsa_sf <- left_join(msia_sf, ehsa, by = c("ADM2_EN" = "location"))

Visualisation with Bar Graph

ggplot(data = ehsa,
       aes(y = classification,fill = classification)) +
  geom_bar(show.legend = FALSE)

Visualising Distribution of EHSA (85% Confidence)

EHSA_sig <- ehsa_sf %>%
  filter(p_value < 0.15) 
tm_shape(ehsa_sf) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(EHSA_sig) +
  tm_fill(col = "classification", title = "Classification") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "EHSA (>85%)", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Testing for Shiny App

All of the code above works well for the basics of Global and Local Spatial Autocorrelation and Emerging Hot Spot Analysis. However, a Shiny application is more dynamic: the user chooses their own inputs to alter the map and make their own interpretation. The code also takes a while to run due to the size of the dataset.

Filtering for Global and Local Spatial Correlation

Global Spatial

As a Shiny application is interactive and the user chooses the inputs, each selection is essentially a filter applied to the dataset.

For msia we have

  • ADM2_EN: we could limit this to all districts, or let the user select a mixture of districts; however, the latter would not make much sense, given that spatial correlation is about clustering across neighbouring areas

  • category: there are only two values, assault and property, so the user can select either one or both

  • type: the options shown depend on the chosen category; the user can select all or a mixture

  • year: it would be best to limit the user to one year, or at most two, given how long the code already takes to run for all four years

  • p_ii: this can be adjusted with a slider to set the significance threshold, where 95% confidence corresponds to p < 0.05 and 85% confidence to p < 0.15, so as to show only strong evidence
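The filters above could map onto Shiny inputs roughly as follows. This is a UI sketch only, with hypothetical input IDs and labels, not the final application code:

```r
library(shiny)

# Sketch of the filter panel; IDs, labels, and choices are illustrative.
filter_ui <- sidebarPanel(
  selectInput("category", "Crime category",
              choices = c("assault", "property"), multiple = TRUE),
  selectInput("type", "Crime type",
              choices = NULL, multiple = TRUE),  # populated server-side from category
  selectInput("year", "Year", choices = 2019:2022),
  sliderInput("p_ii", "Significance threshold (p-value)",
              min = 0.01, max = 0.15, value = 0.05, step = 0.01)
)
```

The type choices would be updated reactively (e.g. with updateSelectInput()) once a category is picked.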

Here is roughly what the filter values would look like:

#in this case no filter is applied for ADM2_EN, it already contains all
ADM2_EN_to_select <- c("all")
category_to_select <- c("assault")
type_to_select <- c("causing_injury")
year_to_select <- c("2019")

However, for this test I will filter only on year, to measure the worst-case running time.

msia_filtered <- msia %>% filter(year %in% year_to_select)

Afterwards we can perform our usual global and local analyses.

msia_filtered_nb_q <- st_contiguity(msia_filtered, queen=TRUE)

This significantly cuts down the timing, but it is still far longer than what a user would be willing to wait: about 45 seconds.

Limiting the category selection to one of the two values might help.

msia_filtered <- msia %>% filter(year %in% year_to_select, category %in% category_to_select )
msia_filtered_nb_q <- st_contiguity(msia_filtered, queen=TRUE)
Warning in spdep::poly2nb(geometry, queen = queen, ...): neighbour object has 4 sub-graphs;
if this sub-graph count seems unexpected, try increasing the snap argument.

This cuts down the timing by quite a fair bit, but will it affect the output?

msia_filtered_wm_rs <- st_weights(msia_filtered_nb_q, style="W")
wm_q_filtered <- msia_filtered %>%
  mutate(nb = msia_filtered_nb_q,
         wt = msia_filtered_wm_rs,
         .before = 1) 
global_moran_test(wm_q_filtered$crimes,
                  wm_q_filtered$nb,
                  wm_q_filtered$wt,
                  zero.policy = TRUE)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 19.464, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     1.391760e-01     -9.794319e-04      5.185241e-05 

From the test, we still see a positive Moran's I statistic, suggesting clustering, i.e. a degree of spatial autocorrelation.

We can also see that the P-value is small.

We can run a Monte Carlo simulation as well.

global_moran_perm(wm_q_filtered$crimes,
           wm_q_filtered$nb,
           wm_q_filtered$wt,
           zero.policy = TRUE,
           nsim = 999,
           na.action=na.omit)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 1000 

statistic = 0.13918, observed rank = 1000, p-value < 2.2e-16
alternative hypothesis: two.sided

The values are about the same as the unfiltered results.

Local Spatial

We can compute the local Moran's I as well.

wm_q_filtered <- wm_q_filtered %>%
          mutate(local_moran = local_moran(
            crimes, nb, wt, nsim = 999, zero.policy=TRUE),
                 .before = 1) %>%
          unnest(local_moran)

So the above works with our filters applied. The remaining consideration is whether to let the user select both categories or only one: a single category cuts the processing time considerably, while having more data portrays the big picture better. Even with both categories, though, it still runs in under a minute.
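To keep the app responsive despite this processing time, the expensive computation could be cached per filter combination, e.g. with Shiny's reactive() and bindCache(). A sketch only: compute_local_moran() is a hypothetical wrapper around the filter, st_contiguity(), st_weights(), and local_moran() steps above.

```r
library(shiny)

# Sketch: cache local Moran results keyed on the user's filter choices,
# so repeated selections of the same year/category return instantly.
# `compute_local_moran()` is a hypothetical wrapper around the
# pipeline shown earlier; it is not defined in this document.
lisa_result <- reactive({
  compute_local_moran(msia,
                      year = input$year,
                      category = input$category)
}) |>
  bindCache(input$year, input$category)
```

bindCache() stores results in an app-level cache, so the first user to request a given year/category pays the cost and later requests are near-instant.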

Testing different types of plots

Different map arrangement

lisa <- wm_q_filtered
map1 <- tm_shape(lisa) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of No of crimes",
            main.title.size = 0.8)

map2 <- tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)
Variable(s) "ii" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "< 0.01", "< 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_text("u_rate", size = 0.4, col = "black") +  # add the unemployment rate as a text label
  tm_layout(main.title = "p-values of Two Variables",
            main.title.size = 0.8)

This plot is one way to display p_ii alongside u_rate, to see whether there is any relation between the unemployment rate and the strong clustering.

lisa <- wm_q_filtered
map1 <- tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "< 0.01", "< 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "p-value of local Moran's I of No of crimes",
            main.title.size = 0.8)

map2 <- tm_shape(lisa) +
  tm_fill("u_rate") + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "unemployment rate",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 1)

Having two maps, filled with p_ii and u_rate respectively, and arranging them top and bottom might be easier to read.

lisa <- wm_q_filtered
map1 <- tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "< 0.01", "< 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "p-value of local Moran's I of No of crimes",
            main.title.size = 0.8)

map2 <- tm_shape(lisa) +
  tm_polygons() +
  tm_text("u_rate", size = 0.4, col = "black") +
  tm_layout(main.title = "unemployment rate",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 1)

Having the text itself seems the better choice, as it makes the actual value clear, but the top-and-bottom arrangement does not seem best. Ultimately, the first plot with some adjustment of the text size seems to be the best option.

LISA Map

lisa <- wm_q_filtered
lisa_sig <- lisa %>% filter(p_ii_sim < 0.05)
tm_shape(lisa) +
  tm_polygons() + 
  tm_borders(alpha = 0.5) + 
  tm_shape(lisa_sig) + 
  tm_fill("mean", title = "LISA class") +
  tm_text("u_rate", size = 0.5, col = "black") +
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "LISA map of crimes", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Testing with different figure size

lisa <- wm_q_filtered
lisa_sig <- lisa %>% filter(p_ii_sim < 0.05)
tm_shape(lisa) +
  tm_polygons() + 
  tm_borders(alpha = 0.5) + 
  tm_shape(lisa_sig) + 
  tm_fill("mean", title = "LISA class") +
  tm_text("u_rate", size = 0.6, col = "black") +
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "LISA map of crimes", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

lisa <- wm_q_filtered
lisa_sig <- lisa %>% filter(p_ii_sim < 0.05)
tm_shape(lisa) +
  tm_polygons() + 
  tm_borders(alpha = 0.5) + 
  tm_shape(lisa_sig) + 
  tm_fill("mean", title = "LISA class") +
  tm_text("u_rate", size = 0.8, col = "black") +
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "LISA map of crimes", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Testing interactive maps

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(lisa) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "< 0.01", "< 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_text("u_rate", size = 0.6, col = "black") +  # add the unemployment rate as a text label
  tm_layout(main.title = "p-values of Two Variables",
            main.title.size = 0.8)
tmap_mode("plot")
tmap mode set to plotting

Filtering for Emerging Hot Spot Analysis

For this we would only be able to filter:

  • Category: there are only two options, and the user selects one of them

  • Type: the user selects a single type based on the chosen category

Example of what the filter would look like

category_to_select <- c(
  "property"
)
type_to_select <- c(
  "break_in"
)

We then repeat the steps from the earlier EHSA section.

msia_df <- msia %>% filter(category %in% category_to_select, type %in% type_to_select )
msia_df <- msia_df %>%
  select(year, crimes, ADM2_EN) %>%
  st_drop_geometry()
msia_sf_filtered <- msia_sf %>%
  semi_join(msia_df, by = "ADM2_EN")
msia_df <- msia_df %>%
    group_by(year, ADM2_EN) %>%
    summarise(crimes = mean(crimes, na.rm = TRUE)) %>%
    ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
check_duplicates <- msia_df %>%
    group_by(year, ADM2_EN) %>%
    filter(n() > 1)
msia_spt <- spacetime(msia_df, msia_sf_filtered,
                 .loc_col = "ADM2_EN",
                 .time_col = "year")
is_spacetime_cube(msia_spt)
[1] TRUE
ehsa <- emerging_hotspot_analysis(
  x = msia_spt, 
  .var = "crimes", 
  k = 1, 
  nsim = 99
)
Warning in spdep::poly2nb(geometry, queen = queen, ...): some observations have no neighbours;
if this seems unexpected, try increasing the snap argument.
Warning in spdep::poly2nb(geometry, queen = queen, ...): neighbour object has 4 sub-graphs;
if this sub-graph count seems unexpected, try increasing the snap argument.
ehsa_sf <- left_join(msia_sf, ehsa, by = c("ADM2_EN" = "location"))
EHSA_sig <- ehsa_sf %>%
  filter(p_value < 0.15) 
tm_shape(ehsa_sf) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  tm_shape(EHSA_sig) +
  tm_fill(col = "classification", title = "Classification") + 
  tm_borders(alpha = 0.4) +
  tm_layout(main.title = "EHSA (>85%)", main.title.size = 1)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

Saving files to RDS

For now, the most useful object to save to rds is msia, as it requires a lot of preprocessing.

write_rds(msia, "data/rds/msia.rds")
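In the Shiny app, this file would be loaded once at startup, outside the server function, so every session shares the same preprocessed object. A sketch assuming the same path:

```r
# Sketch: load the preprocessed object once, at app start-up.
# Code placed outside server() in app.R runs a single time per R process,
# so each user session reuses the same `msia` object.
msia <- readr::read_rds("data/rds/msia.rds")

server <- function(input, output, session) {
  # reactive filtering and mapping would use `msia` here
}
```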

Proposed Shiny Application

The parameters in this prototype view are based on what was tested in the Testing for Shiny App section. More may be added, or some removed, as deemed necessary during development.